{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# Combine predictors using stacking\n\n\nStacking refers to a method to blend estimators. In this strategy, some\nestimators are individually fitted on some training data while a final\nestimator is trained using the stacked predictions of these base estimators.\n\nIn this example, we illustrate the use case in which different regressors are\nstacked together and a final linear penalized regressor is used to output the\nprediction. We compare the performance of each individual regressor with the\nstacking strategy. Stacking slightly improves the overall performance.\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "print(__doc__)\n\n# Authors: Guillaume Lemaitre <g.lemaitre58@gmail.com>\n# License: BSD 3 clause"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The function ``plot_regression_results`` is used to plot the predicted and\ntrue targets.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import matplotlib.pyplot as plt\n\n\ndef plot_regression_results(ax, y_true, y_pred, title, scores, elapsed_time):\n    \"\"\"Scatter plot of the predicted vs true targets.\"\"\"\n    ax.plot([y_true.min(), y_true.max()],\n            [y_true.min(), y_true.max()],\n            '--r', linewidth=2)\n    ax.scatter(y_true, y_pred, alpha=0.2)\n\n    ax.spines['top'].set_visible(False)\n    ax.spines['right'].set_visible(False)\n    ax.get_xaxis().tick_bottom()\n    ax.get_yaxis().tick_left()\n    ax.spines['left'].set_position(('outward', 10))\n    ax.spines['bottom'].set_position(('outward', 10))\n    ax.set_xlim([y_true.min(), y_true.max()])\n    ax.set_ylim([y_true.min(), y_true.max()])\n    ax.set_xlabel('Measured')\n    ax.set_ylabel('Predicted')\n    extra = plt.Rectangle((0, 0), 0, 0, fc=\"w\", fill=False,\n                          edgecolor='none', linewidth=0)\n    ax.legend([extra], [scores], loc='upper left')\n    title = title + '\\n Evaluation in {:.2f} seconds'.format(elapsed_time)\n    ax.set_title(title)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Stack of predictors on a single data set\n##############################################################################\n It is sometimes tedious to find the model which will best perform on a given\n dataset. Stacking provide an alternative by combining the outputs of several\n learners, without the need to choose a model specifically. The performance of\n stacking is usually close to the best model and sometimes it can outperform\n the prediction performance of each individual model.\n\n Here, we combine 3 learners (linear and non-linear) and use a ridge regressor\n to combine their outputs together.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from sklearn.ensemble import StackingRegressor\nfrom sklearn.ensemble import RandomForestRegressor\nfrom sklearn.experimental import enable_hist_gradient_boosting  # noqa\nfrom sklearn.ensemble import HistGradientBoostingRegressor\nfrom sklearn.linear_model import LassoCV\nfrom sklearn.linear_model import RidgeCV\n\nestimators = [\n    ('Random Forest', RandomForestRegressor(random_state=42)),\n    ('Lasso', LassoCV()),\n    ('Gradient Boosting', HistGradientBoostingRegressor(random_state=0))\n]\nstacking_regressor = StackingRegressor(\n    estimators=estimators, final_estimator=RidgeCV()\n)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We used the Boston data set (prediction of house prices). We check the\nperformance of each individual predictor as well as the stack of the\nregressors.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import time\nimport numpy as np\nfrom sklearn.datasets import load_boston\nfrom sklearn.model_selection import cross_validate, cross_val_predict\n\nX, y = load_boston(return_X_y=True)\n\nfig, axs = plt.subplots(2, 2, figsize=(9, 7))\naxs = np.ravel(axs)\n\nfor ax, (name, est) in zip(axs, estimators + [('Stacking Regressor',\n                                               stacking_regressor)]):\n    start_time = time.time()\n    score = cross_validate(est, X, y,\n                           scoring=['r2', 'neg_mean_absolute_error'],\n                           n_jobs=-1, verbose=0)\n    elapsed_time = time.time() - start_time\n\n    y_pred = cross_val_predict(est, X, y, n_jobs=-1, verbose=0)\n    plot_regression_results(\n        ax, y, y_pred,\n        name,\n        (r'$R^2={:.2f} \\pm {:.2f}$' + '\\n' + r'$MAE={:.2f} \\pm {:.2f}$')\n        .format(np.mean(score['test_r2']),\n                np.std(score['test_r2']),\n                -np.mean(score['test_neg_mean_absolute_error']),\n                np.std(score['test_neg_mean_absolute_error'])),\n        elapsed_time)\n\nplt.suptitle('Single predictors versus stacked predictors')\nplt.tight_layout()\nplt.subplots_adjust(top=0.9)\nplt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The stacked regressor will combine the strengths of the different regressors.\nHowever, we also see that training the stacked regressor is much more\ncomputationally expensive.\n\n"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.9"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}